{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "14d3fd06",
   "metadata": {
    "id": "14d3fd06"
   },
   "source": [
    "# Hybrid Search\n",
    "\n",
    "By default, search in LangChain is done by vector similarity. However, a number of vectorstore implementations (Astra DB, Elasticsearch, Neo4j, AzureSearch, ...) also support more advanced search that combines vector similarity with other search techniques (full-text, BM25, and so on). This is generally referred to as \"hybrid\" search.\n",
    "\n",
    "**Step 1: Make sure the vectorstore you are using supports hybrid search**\n",
    "\n",
    "At the moment, there is no unified way to perform hybrid search in LangChain. Each vectorstore may have its own way of doing it, generally exposed as a keyword argument passed to `similarity_search`. Read the documentation or source code to figure out whether the vectorstore you are using supports hybrid search and, if so, how to use it.\n",
    "\n",
    "**Step 2: Add that parameter as a configurable field for the chain**\n",
    "\n",
    "This will let you easily call the chain and configure any relevant flags at runtime. See [this documentation](/docs/how_to/configure) for more information on configuration.\n",
    "\n",
    "**Step 3: Call the chain with that configurable field**\n",
    "\n",
    "At runtime, you can now call the chain and set the configurable field.\n",
    "\n",
    "## Code Example\n",
    "\n",
    "Let's see a concrete example of what this looks like in code. We will use the Cassandra/CQL interface of Astra DB for this example.\n",
    "\n",
    "Install the following Python package:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c2efe35eea197769",
   "metadata": {
    "id": "c2efe35eea197769",
    "outputId": "527275b4-076e-4b22-945c-e41a59188116"
   },
   "outputs": [],
   "source": [
    "!pip install \"cassio>=0.1.7\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b4ef96d44341cd84",
   "metadata": {
    "collapsed": false,
    "id": "b4ef96d44341cd84"
   },
   "source": [
    "Get the [connection secrets](https://docs.datastax.com/en/astra/astra-db-vector/get-started/quickstart.html).\n",
    "\n",
    "Initialize cassio:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cb2cef097277c32e",
   "metadata": {
    "id": "cb2cef097277c32e",
    "outputId": "4c3d05a0-319a-44a0-8ec3-0a9c78453132"
   },
   "outputs": [],
   "source": [
    "import cassio\n",
    "\n",
    "cassio.init(\n",
    "    database_id=\"Your database ID\",\n",
    "    token=\"Your application token\",\n",
    "    keyspace=\"Your keyspace\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e1e51444877f45eb",
   "metadata": {
    "collapsed": false,
    "id": "e1e51444877f45eb"
   },
   "source": [
    "Create the Cassandra VectorStore with a standard [index analyzer](https://docs.datastax.com/en/astra/astra-db-vector/cql/use-analyzers-with-cql.html). The index analyzer is needed to enable term matching."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7345de3c",
   "metadata": {
    "id": "7345de3c",
    "outputId": "d38bcee0-0134-4ac6-8d35-afcce282481b"
   },
   "outputs": [],
   "source": [
    "from cassio.table.cql import STANDARD_ANALYZER\n",
    "from langchain_community.vectorstores import Cassandra\n",
    "from langchain_openai import OpenAIEmbeddings\n",
    "\n",
    "embeddings = OpenAIEmbeddings()\n",
    "vectorstore = Cassandra(\n",
    "    embedding=embeddings,\n",
    "    table_name=\"test_hybrid\",\n",
    "    # The analyzer on the document body is what enables term matching.\n",
    "    body_index_options=[STANDARD_ANALYZER],\n",
    "    # None: fall back to the session and keyspace set by cassio.init().\n",
    "    session=None,\n",
    "    keyspace=None,\n",
    ")\n",
    "\n",
    "vectorstore.add_texts(\n",
    "    [\n",
    "        \"In 2023, I visited Paris\",\n",
    "        \"In 2022, I visited New York\",\n",
    "        \"In 2021, I visited New Orleans\",\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "73887f23bbab978c",
   "metadata": {
    "collapsed": false,
    "id": "73887f23bbab978c"
   },
   "source": [
    "If we do a standard similarity search, we get all the documents:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3c2a39fa",
   "metadata": {
    "id": "3c2a39fa",
    "outputId": "5290085b-896c-4c81-9b40-c315331b7009"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Document(page_content='In 2022, I visited New York'),\n",
       "Document(page_content='In 2023, I visited Paris'),\n",
       "Document(page_content='In 2021, I visited New Orleans')]"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vectorstore.as_retriever().invoke(\"What city did I visit last?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "78d4c3c79e67d8c3",
   "metadata": {
    "collapsed": false,
    "id": "78d4c3c79e67d8c3"
   },
   "source": [
    "The Astra DB vectorstore's `body_search` argument restricts the search to documents that contain a given term, here `new`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "56393baa",
   "metadata": {
    "id": "56393baa",
    "outputId": "d1c939f3-342f-4df4-94a3-d25429b5a25e"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Document(page_content='In 2022, I visited New York'),\n",
       "Document(page_content='In 2021, I visited New Orleans')]"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vectorstore.as_retriever(search_kwargs={\"body_search\": \"new\"}).invoke(\n",
    "    \"What city did I visit last?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "88ae97ed",
   "metadata": {
    "id": "88ae97ed"
   },
   "source": [
    "We can now create the chain that we will use for question-answering."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "62707b4f",
   "metadata": {
    "id": "62707b4f"
   },
   "outputs": [],
   "source": [
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "from langchain_core.runnables import (\n",
    "    ConfigurableField,\n",
    "    RunnablePassthrough,\n",
    ")\n",
    "from langchain_openai import ChatOpenAI"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b6778ffa",
   "metadata": {
    "id": "b6778ffa"
   },
   "source": [
    "This is a basic question-answering chain setup."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "44a865f6",
   "metadata": {
    "id": "44a865f6"
   },
   "outputs": [],
   "source": [
    "template = \"\"\"Answer the question based only on the following context:\n",
    "{context}\n",
    "Question: {question}\n",
    "\"\"\"\n",
    "prompt = ChatPromptTemplate.from_template(template)\n",
    "\n",
    "model = ChatOpenAI()\n",
    "\n",
    "retriever = vectorstore.as_retriever()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "72125166",
   "metadata": {
    "id": "72125166"
   },
   "source": [
    "Here we mark the retriever as having a configurable field. All vectorstore retrievers have `search_kwargs` as a field. This is just a dictionary with vectorstore-specific fields."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "babbadff",
   "metadata": {
    "id": "babbadff"
   },
   "outputs": [],
   "source": [
    "configurable_retriever = retriever.configurable_fields(\n",
    "    search_kwargs=ConfigurableField(\n",
    "        id=\"search_kwargs\",\n",
    "        name=\"Search Kwargs\",\n",
    "        description=\"The search kwargs to use\",\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2d481b70",
   "metadata": {
    "id": "2d481b70"
   },
   "source": [
    "We can now create the chain using our configurable retriever:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "210b0446",
   "metadata": {
    "id": "210b0446"
   },
   "outputs": [],
   "source": [
    "chain = (\n",
    "    {\"context\": configurable_retriever, \"question\": RunnablePassthrough()}\n",
    "    | prompt\n",
    "    | model\n",
    "    | StrOutputParser()\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a38037b2",
   "metadata": {
    "id": "a38037b2",
    "outputId": "1ea14996-5965-4a5e-9678-b9c35ce5c6de"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Paris"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain.invoke(\"What city did I visit last?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7f6458c3",
   "metadata": {
    "id": "7f6458c3"
   },
   "source": [
    "We can now invoke the chain with configurable options. Under `configurable`, the key `search_kwargs` is the id of the configurable field, and its value is the dictionary of search kwargs to use for Astra DB."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9gYLqBTH8BFz",
   "metadata": {
    "id": "9gYLqBTH8BFz",
    "outputId": "4358a2e6-f306-48f1-dd5c-781ac8a33e89"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "New York"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain.invoke(\n",
    "    \"What city did I visit last?\",\n",
    "    config={\"configurable\": {\"search_kwargs\": {\"body_search\": \"new\"}}},\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
